Wan AI Video Generator

Wan AI is an advanced visual generation model family developed by Alibaba's Tongyi Lab. It generates videos from text, images, and other control signals. Following Wan 2.1, the Wan 2.2 series models are now fully open source.

Wan Video: AI Video Generator

Wan 2.1 (Open Source)

An advanced open-source video generation model offering high quality and versatility, well suited to professional content creation.

Text-to-Video Example

See how Wan 2.1 transforms text into video.

Prompt:

A couple in formal evening attire is caught in heavy rain on their way home, holding a black umbrella. In the flat shot, the man is wearing a black suit and the woman is wearing a white long dress. They walk slowly in the rain, and the rain drips down the umbrella. The camera moves smoothly with their steps, showing their elegant posture in the rain.

Key Features

  • High-quality video generation
  • Text-to-video & Image-to-video
  • Open source availability
Wan 2.2 (Open Source)

The next generation of the Wan AI video generator, with enhanced quality, precise control, and expanded creative possibilities.

Key Features

  • Advanced Control: precise control over video generation
  • High Performance: optimized processing speed
  • Quality Output: superior video quality
  • Versatile Input: multiple input types

Wan 2.5

An AI generation tool built on a native multimodal architecture. Its core breakthroughs, 10-second audio-visual synchronization and 4K cinematic quality, take it beyond the previous generation's purely visual generation to end-to-end audio-visual co-creation, balancing practical use cases with creative precision.

Key Features

  • Audio-Visual Sync: native synchronization with accurate lip-sync across languages
  • 4K Cinematic: 10-second 1080P/4K HD video at 24 fps with rich lighting
  • Camera Control: advanced prompt adherence with complex camera movements
  • Multimodal Input: text/image-to-video with conversational editing

Wan 2.2 Fun Control

Enhanced control and creative freedom with the latest Wan AI technology, offering greater precision in video generation.

Generation Example

Advanced motion control and style transfer

Inputs: a reference character and a reference motion clip. Output: the generated video.

The result combines the character's style with the reference motion to create personalized video content.

Advanced Features

  • Advanced Control
  • Improved Video Quality
  • Enhanced Creative Options
Wan 2.2 Animate

Combine static images with reference videos to generate dynamic animated videos with advanced motion control and smooth transitions.

Animation Example

Image + reference video to animated video

Inputs: an image and a reference video. Output: the generated animated video.

Combining the image with the reference video produces a dynamic animated video with smooth motion.

Key Features

  • Image + Video to video animation
  • Reference video motion transfer
  • Smooth motion control
Wan Video LoRA (Coming Soon)

Specialized video adaptation using Wan AI LoRA technology. Create unique and personalized video styles with minimal training.

Specialized Features

  • Custom style adaptation
  • Fast fine-tuning capabilities
  • Efficient resource usage
  • Advanced style transfer
Coming Soon

Wan Image: AI Image Generator

Qwen Text-to-Image

AI-Powered Image Generation

Natural Language Understanding

Generate images from natural-language descriptions in Chinese or English, ranging from classical poetry to modern expressions.

High-Definition Output

Ultra-detailed rendering with exceptional clarity, perfect for professional content creation

Style Control

Precise style control with simple keywords, from anime to photorealistic rendering

Qwen Image Generation Example

Example output generated from a natural-language description.

Qwen Image Edit

Precise Image Editing & Enhancement

Key Features

Smart Text Editing

Intelligent font matching and style preservation for text modifications

Object Replacement

Seamless object swapping with automatic lighting and reflection adjustment

Effect Generation

Add professional visual effects with simple brush strokes

Draw to Image Workflow

  1. Select Area: circle or mark the region to edit.
  2. Draw Input: sketch your changes.
  3. Describe: add text instructions.
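The three steps above amount to packaging a region mask, a sketch, and a text instruction together. The minimal sketch below rasterizes a circled region into a binary mask and bundles it with the other two inputs. `circle_mask` and `EditRequest` are hypothetical names used only for illustration; they are not part of any Qwen or Wan API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Mask = List[List[int]]

def circle_mask(height: int, width: int, cy: float, cx: float, radius: float) -> Mask:
    """Return a height x width mask: 1 for pixels inside the circle, 0 elsewhere."""
    return [[1 if (y - cy) ** 2 + (x - cx) ** 2 <= radius ** 2 else 0
             for x in range(width)]
            for y in range(height)]

@dataclass
class EditRequest:
    """Hypothetical bundle of the three workflow inputs."""
    mask: Mask                                                          # step 1: selected area
    strokes: List[List[Tuple[int, int]]] = field(default_factory=list)  # step 2: sketch polylines
    instruction: str = ""                                               # step 3: text description

    def selected_pixels(self) -> int:
        """Number of pixels the edit is allowed to change."""
        return sum(sum(row) for row in self.mask)

req = EditRequest(
    mask=circle_mask(5, 5, cy=2, cx=2, radius=1),
    strokes=[[(1, 2), (3, 2)]],
    instruction="Replace the cup with a glass vase",
)
for row in req.mask:
    print(row)
print(req.selected_pixels())  # 5
```

Keeping the mask separate from the strokes and the text lets an editor restrict changes to the selected pixels while still using the sketch and description as guidance.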

Overview of Wan AI

SOTA Performance

Wan AI consistently outperforms leading open-source models and commercial video solutions across multiple industry benchmarks.

Consumer-GPU Optimized

The Wan 2.1 T2V-1.3B model requires only 8.19 GB of VRAM, so it runs on mainstream consumer GPUs. On an RTX 4090 it generates a 5-second 480P video in about 4 minutes without quantization, with performance comparable to some proprietary models.
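As a rough illustration of why parameter count decides whether a model fits on a consumer GPU, the sketch below estimates only the memory taken by model weights (parameter count × bytes per parameter). It ignores activations, the text encoder, and the VAE, so it is a lower bound and does not reproduce the 8.19 GB figure. The bf16 precision (2 bytes per parameter) is an assumption of the sketch; the 24 GB value is the RTX 4090's memory capacity.

```python
# Rough, weight-only memory estimate.
# Assumption: weights stored in bf16 (2 bytes per parameter).
# Activations, text encoder, and VAE memory are deliberately ignored.

BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1}

def weight_memory_gb(num_params: float, dtype: str = "bf16") -> float:
    """Memory needed just to hold the weights, in GB (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

def fits_on_gpu(num_params: float, vram_gb: float, dtype: str = "bf16") -> bool:
    """True if the weights alone fit within the given VRAM budget."""
    return weight_memory_gb(num_params, dtype) <= vram_gb

RTX_4090_VRAM_GB = 24  # consumer card with 24 GB of memory

for name, params in [("T2V-1.3B", 1.3e9), ("T2V-14B", 14e9)]:
    gb = weight_memory_gb(params)
    print(f"{name}: ~{gb:.1f} GB of bf16 weights, "
          f"fits on RTX 4090: {fits_on_gpu(params, RTX_4090_VRAM_GB)}")
```

Under these assumptions the 1.3B model's weights (about 2.6 GB) leave ample headroom on a 24 GB card, while a 14B model's weights alone (about 28 GB) would not fit without offloading or quantization.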

Multimodal Capabilities

Wan AI delivers exceptional results in Text-to-Video, Image-to-Video, Video Editing, Text-to-Image, and Video-to-Audio tasks, redefining intelligent video generation.

Visual Text Rendering

Wan Video introduces the first cross-lingual text generation engine for videos, supporting both Chinese and English with production-ready typography integration.

Advanced Wan-VAE Architecture

Wan-VAE efficiently encodes and decodes 1080P video of any duration while maintaining temporal coherence, forming the core foundation for next-generation video generation systems.
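Because the VAE compresses video in both time and space, the generator works on a much smaller latent tensor. The sketch below computes that latent shape. The strides (4 in time, 8 in height and width), the 16 latent channels, and the "4n + 1" frame-count rule for a causal VAE are assumptions based on the public Wan 2.1 configuration, not details stated on this page.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class VaeConfig:
    temporal_stride: int = 4   # assumed: 4x compression along time
    spatial_stride: int = 8    # assumed: 8x compression along height and width
    latent_channels: int = 16  # assumed latent channel count

def latent_shape(num_frames: int, height: int, width: int,
                 cfg: VaeConfig = VaeConfig()) -> Tuple[int, int, int, int]:
    """Return (channels, latent_frames, latent_height, latent_width).

    A causal video VAE encodes the first frame on its own and then each
    following group of `temporal_stride` frames, so the frame count must
    have the form temporal_stride * n + 1.
    """
    if (num_frames - 1) % cfg.temporal_stride != 0:
        raise ValueError(f"num_frames must be {cfg.temporal_stride}*n + 1, got {num_frames}")
    if height % cfg.spatial_stride or width % cfg.spatial_stride:
        raise ValueError("height and width must be divisible by the spatial stride")
    latent_frames = 1 + (num_frames - 1) // cfg.temporal_stride
    return (cfg.latent_channels, latent_frames,
            height // cfg.spatial_stride, width // cfg.spatial_stride)

# 81 frames of 480x832 video (about 5 seconds at an assumed 16 fps)
print(latent_shape(81, 480, 832))  # (16, 21, 60, 104)
```

Under these assumptions, an 81-frame 480×832 clip shrinks to a 21×60×104 latent grid, nearly 250 times fewer spatio-temporal positions per channel, which helps keep generation at this resolution within a single GPU's budget.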

Text-to-Image Generation

Wan AI's native multi-modal architecture supports text-to-image generation, empowering users to directly create high-fidelity images from descriptions for diverse creative needs.

Advanced Image Editing & Composition

Wan Image excels in sophisticated editing tasks, including modifying text within images and seamlessly composing or fusing multiple pictures. It maintains high subject consistency and produces Asian portraits with enhanced realism, ensuring outputs meet commercial-grade standards.

Wan AI Technology

Features of Wan AI

Wan Video Features

Complex Motion Generation

Wan Video models excel at generating realistic videos with large-scale body movements, complex rotations, dynamic scene transitions, and smooth cinematic camera motions. Advanced versions further enhance multi-character interaction and long-sequence motion consistency.

Realistic Physical Simulation

Wan AI accurately simulates real-world physics, including object collisions, gravity, fluid dynamics, and material interactions. Higher-tier models deliver more precise environmental responses and physically consistent animations.

Cinematic Visual Quality

Wan AI Video offers film-level visual quality with rich textures, natural lighting, depth-of-field effects, and multiple cinematic styles. Professional models unlock advanced visual effects, color grading, and stylized cinematic rendering.

Controllable Video Editing

Wan AI provides a universal video editing framework with precise controllability using image or video references. Different model versions support object replacement, motion transfer, scene restructuring, and temporal consistency editing.

Visual Text & Dynamic Typography

Wan Video can generate static and dynamic text effects directly inside videos from text prompts. Advanced models support bilingual (Chinese & English) typography, animated captions, and creative text motion effects for advertising and media production.

Wan Image Features

High-Precision Image Generation

Wan Image generates high-resolution images with accurate structure, detailed textures, and realistic lighting. Different versions support 2K–4K output, ultra-detailed realism, and artistic illustration styles.

Advanced Image Editing & Inpainting

Wan Image supports precise inpainting, object removal, detail enhancement, and content replacement. Professional versions enable pixel-level refinement and complex region-aware editing.
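Region-aware editing of this kind typically ends with a compositing step, so that pixels outside the edited region stay exactly as they were: output = mask · generated + (1 − mask) · original. The sketch below shows that generic blend on small grayscale arrays. It illustrates a common inpainting technique and is not taken from Wan Image itself.

```python
from typing import List

Image = List[List[float]]

def shape(img: Image) -> List[int]:
    """Row lengths, which together describe a 2-D list's shape."""
    return [len(row) for row in img]

def composite(original: Image, generated: Image, mask: Image) -> Image:
    """Blend per pixel: mask=1 takes the generated pixel, mask=0 keeps the original.

    Fractional mask values (for example, feathered edges) give a soft transition.
    """
    if not (shape(original) == shape(generated) == shape(mask)):
        raise ValueError("images and mask must share a shape")
    return [[m * g + (1.0 - m) * o for o, g, m in zip(orow, grow, mrow)]
            for orow, grow, mrow in zip(original, generated, mask)]

original  = [[0.25, 0.25], [0.25, 0.25]]
generated = [[0.75, 0.75], [0.75, 0.75]]
mask      = [[0.0, 1.0], [0.5, 0.0]]
print(composite(original, generated, mask))  # [[0.25, 0.75], [0.5, 0.25]]
```

Pixels with mask 0 come back unchanged, which is what keeps an edit from disturbing the rest of the picture.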

Style Transfer & Visual Control

Wan Image enables multi-style rendering, including realism, anime, 3D, watercolor, oil painting, and cyberpunk. Advanced models support fine-grained style strength control and cross-style fusion.

Outpainting & Image Expansion

Wan Image allows seamless image expansion beyond original boundaries while maintaining visual consistency. Higher-end models support wide-format expansion for banners, posters, and commercial layouts.

ArtAny AI & Wan AI Product Features

ArtAny AI seamlessly integrates Wan AI's powerful video and image models into a unified, user-friendly creative platform. With just a few clicks, users can generate, edit, and enhance videos, images, and audio content for marketing, social media, advertising, and professional production.

Wan AI Text to Video

Transform simple text prompts into high-quality cinematic videos with dynamic motion, realistic physics, and multiple visual styles powered by Wan Video.

Wan Image to Video

Animate static images into vivid motion videos with smooth transitions, camera movement, and character animation using Wan Video technology.

Start & End Frame Control

Precisely control the opening and closing frames of your video to ensure visual consistency, smooth transitions, and stronger storytelling.
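One way to picture start- and end-frame control is as a per-frame mask: the user pins the first and last frames, and the model fills in everything between them. The sketch below illustrates that idea; `build_keyframe_mask` and `frames_to_generate` are hypothetical names, and this is not Wan AI's actual conditioning mechanism.

```python
from typing import List

def build_keyframe_mask(num_frames: int, start: bool = True, end: bool = True) -> List[int]:
    """One flag per frame: 1 = frame supplied by the user, 0 = frame to generate."""
    if num_frames < 2:
        raise ValueError("need at least two frames to pin both ends")
    mask = [0] * num_frames
    if start:
        mask[0] = 1   # opening frame is fixed
    if end:
        mask[-1] = 1  # closing frame is fixed
    return mask

def frames_to_generate(mask: List[int]) -> List[int]:
    """Indices of the frames the model must fill in."""
    return [i for i, known in enumerate(mask) if not known]

mask = build_keyframe_mask(9)
print(mask)                      # [1, 0, 0, 0, 0, 0, 0, 0, 1]
print(frames_to_generate(mask))  # [1, 2, 3, 4, 5, 6, 7]
```

Turning off `end` reduces this to ordinary image-to-video, where only the opening frame is pinned.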

Wan AI Text to Image

Generate high-resolution images from text prompts with ultra-detailed realism, artistic illustration styles, and full creative control powered by Wan Image.

Image Editing & Enhancement

Edit images with powerful tools including inpainting, object removal, background replacement, style transfer, and outpainting for professional-grade visual design.

Video-to-Audio & AI Voice

Generate background music, sound effects, and AI voiceovers directly from videos or scripts, enabling synchronized audio-visual production in one workflow.

Wan AI Video Editing & Visual Effects

Enhance videos with intelligent editing features such as object replacement, motion transfer, cinematic color grading, and stylized visual effects.

Wan AI Open Source Release

Alibaba has officially open-sourced the code and weights of both Wan 2.1 and Wan 2.2 for the community. Wan AI is a comprehensive, open suite of video foundation models designed to push the boundaries of video generation and empower the developer and research communities.

Wan 2.2 Open Source Models

Wan2.2 represents a major upgrade to the Wan video foundation models, delivering significant improvements in architecture, visual quality, motion realism, and high-definition generation efficiency.

Key highlights include:

MoE Architecture for Higher Model Capacity

Wan2.2 introduces a Mixture-of-Experts (MoE) structure into video diffusion, enabling larger effective model capacity without increasing computational cost.
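The sketch below illustrates how an MoE denoiser can grow total capacity without raising per-step cost: only one expert runs at each denoising step. It assumes a two-expert design in which a high-noise expert handles early steps and a low-noise expert handles later ones, switched at a fixed boundary. The expert functions, the illustrative `boundary` of 0.875, and the timestep convention (1.0 = pure noise, 0.0 = clean) are assumptions, not Wan 2.2's actual code or switching criterion.

```python
from typing import Callable, List, Tuple

Expert = Callable[[float], str]

def high_noise_expert(t: float) -> str:
    # Hypothetical stand-in for the expert that shapes overall layout and motion.
    return "high"

def low_noise_expert(t: float) -> str:
    # Hypothetical stand-in for the expert that refines textures and details.
    return "low"

def select_expert(t: float, boundary: float = 0.875) -> Expert:
    """Pick exactly one expert per step; t runs from 1.0 (noise) down to 0.0 (clean)."""
    return high_noise_expert if t >= boundary else low_noise_expert

def run_schedule(num_steps: int, boundary: float = 0.875) -> List[Tuple[float, str]]:
    """Return (timestep, expert used) for each step of a uniform schedule."""
    trace = []
    for i in range(num_steps):
        t = 1.0 - i / num_steps        # uniform timesteps from 1.0 toward 0.0
        expert = select_expert(t, boundary)
        trace.append((t, expert(t)))   # one expert forward pass per step
    return trace

trace = run_schedule(8)
print([name for _, name in trace])
# ['high', 'high', 'low', 'low', 'low', 'low', 'low', 'low']
```

Total parameters are the sum of both experts, but each step pays for only one of them, which is how capacity grows while per-step compute stays flat.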

Cinematic-Level Aesthetic Control

With carefully curated aesthetic datasets labeled by lighting, composition, contrast, and color tone, Wan2.2 enables highly controllable cinematic-style video generation.

Stronger Complex Motion Generation

Trained on substantially larger datasets (+65.6% images, +83.2% videos vs. Wan2.1), Wan2.2 achieves top-tier performance in motion realism, semantic accuracy, and aesthetic quality.

Efficient 720P Hybrid Text & Image to Video (TI2V)

The open-sourced 5B model with Wan2.2-VAE supports both Text-to-Video and Image-to-Video at 720P, 24fps, runs on consumer GPUs like RTX 4090, and ranks among the fastest HD video models available.
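One reason a higher-compression VAE speeds up HD generation is that the transformer sees far fewer tokens. The sketch below compares token counts for the same 720P-class clip under two VAEs. The strides (4 in time with 8 or 16 spatially), the 2×2 spatial patch, the 121-frame length, and the 1280×704 size are assumptions modeled on public Wan configurations, not figures stated on this page.

```python
def dit_tokens(num_frames: int, height: int, width: int,
               vae_t: int, vae_s: int, patch_s: int = 2) -> int:
    """Token count seen by the transformer after VAE compression and patchifying.

    Assumes a causal VAE (first frame kept alone, so num_frames = vae_t*n + 1)
    and patches of 1 latent frame x patch_s x patch_s latent pixels.
    """
    if (num_frames - 1) % vae_t != 0:
        raise ValueError("frame count must be vae_t*n + 1")
    step = vae_s * patch_s  # pixels per token along each spatial axis
    if height % step or width % step:
        raise ValueError("height and width must be divisible by vae_s * patch_s")
    latent_frames = 1 + (num_frames - 1) // vae_t
    return latent_frames * (height // step) * (width // step)

# 121 frames (about 5 s at 24 fps) at 1280x704 pixels.
frames, h, w = 121, 704, 1280
baseline = dit_tokens(frames, h, w, vae_t=4, vae_s=8)   # assumed 8x spatial VAE
compact = dit_tokens(frames, h, w, vae_t=4, vae_s=16)   # assumed 16x spatial VAE
print(baseline, compact, baseline // compact)  # 109120 27280 4
```

Because self-attention cost grows roughly with the square of the token count, a 4× reduction in tokens can cut attention compute by considerably more than 4×.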

Advanced I2V-A14B Image-to-Video Model

Built with MoE architecture, the I2V-A14B model supports 480P and 720P I2V generation with more stable motion, fewer unrealistic camera movements, and stronger performance for stylized scenes.

  • Wan2.2 T2V-A14B
  • Wan2.2 I2V-A14B
  • Wan2.2 TI2V-5B
  • Wan2.2 S2V-14B
  • Wan2.2 Animate-14B

Wan 2.1 Open Source Models

Wan2.1 is a comprehensive and open suite of video foundation models that significantly advances the capabilities of Wan AI Video Generator.

Key highlights include:

State-of-the-Art Performance

Wan2.1 achieves top-tier performance across multiple benchmarks, outperforming most open-source video models and rivaling leading commercial solutions.

Consumer GPU Compatibility

The T2V-1.3B model runs on as little as 8.19 GB VRAM, enabling high-quality video generation on mainstream consumer GPUs such as the RTX 4090.

Full-Stack Multi-Task Support

Wan2.1 supports Text-to-Video, Image-to-Video, Video Editing, Text-to-Image, and Video-to-Audio, delivering a complete multimodal video generation pipeline.

Bilingual Visual Text Generation

As the first video model capable of generating both Chinese and English on-screen text, Wan 2.1 expands real-world creative and commercial use cases.

High-Performance Wan-VAE

Wan-VAE enables efficient encoding and decoding of 1080P videos of any length while preserving temporal consistency, serving as a robust foundation for video and image generation.

T2V-14B Flagship Model

The T2V-14B model achieves state-of-the-art results among both open- and closed-source models, excelling at dynamic motion generation and supporting bilingual video output at 480P and 720P.

  • Wan2.1 T2V-1.3B
  • Wan2.1 T2V-14B
  • Wan2.1 I2V-14B
  • Wan2.1 FLF2V-14B
  • Wan AI VACE

Wan 2.6 has been officially released

Bringing a major leap forward in AI video generation

15-Second Long-Form Video Generation

Unlock extended creative storytelling possibilities for creators, filmmakers, and marketers with 15-second long-form video generation.

LoRA Fine-Tuning Support

Customize characters, styles, and motion behaviors with lightweight training—making personalized AI video creation faster and more accessible than ever.
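LoRA (low-rank adaptation) keeps a pretrained weight matrix W frozen and trains only a low-rank update, so the adapted layer uses W + (alpha / r) · B · A, where B is d × r and A is r × k. The sketch below applies that update in pure Python and counts trainable parameters. The matrix sizes, rank, and alpha are illustrative values, not Wan 2.6 training settings.

```python
from typing import List, Tuple

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of an (n x m) and an (m x p) matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def lora_merge(w: Matrix, b: Matrix, a: Matrix, alpha: float) -> Matrix:
    """Return W + (alpha / r) * B @ A, where the rank r is the number of rows of A."""
    r = len(a)
    delta = matmul(b, a)
    scale = alpha / r
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]

def trainable_params(d: int, k: int, r: int) -> Tuple[int, int]:
    """(full fine-tuning params, LoRA params) for one d x k weight matrix."""
    return d * k, r * (d + k)

# Tiny example: 2x2 frozen weight with a rank-1 update.
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]   # d x r
A = [[0.5, 0.5]]     # r x k
print(lora_merge(W, B, A, alpha=1.0))  # [[1.5, 0.5], [1.0, 2.0]]

# A hypothetical 5120 x 5120 projection adapted at rank 16:
print(trainable_params(5120, 5120, 16))  # (26214400, 163840)
```

In this example, rank 16 trains about 0.6% of the layer's parameters, which is what makes LoRA training lightweight compared with full fine-tuning.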

Enhanced Character Consistency

Greatly strengthened character consistency, ensuring stable identities, facial features, and motion continuity across longer video sequences.

Native AI Music Generation

Wan AI music generation will be natively integrated, allowing seamless synchronization of visuals and sound within a single creative workflow.

Wan AI Frequently Asked Questions

1. What is Wan Video by Wan AI and how does it work?

Wan Video is a state-of-the-art video generation system developed under the Wan AI framework. It transforms text or image inputs into high-quality videos using advanced technologies such as Variational Autoencoders (VAE) and Diffusion Transformers (DiT), delivering realistic motion, cinematic visuals, and accurate physical behavior.
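To make the VAE + DiT description concrete, the toy sketch below walks through the usual latent-generation loop: start from random noise in latent space, repeatedly apply the denoiser, then decode. It assumes a flow-matching (rectified-flow) formulation in which x_t = (1 − t) · x0 + t · noise and the network predicts the velocity noise − x0. An oracle stands in for the trained DiT and the identity function stands in for the VAE decoder, so it demonstrates only the sampling loop, not Wan's implementation.

```python
import random
from typing import List

Vector = List[float]

def oracle_velocity(x_t: Vector, t: float, x0: Vector, noise: Vector) -> Vector:
    """Stand-in for the DiT: the exact flow-matching velocity (noise - x0).

    A real model would predict this from x_t, t, and the prompt alone.
    """
    return [n - c for n, c in zip(noise, x0)]

def euler_sample(noise: Vector, x0: Vector, steps: int) -> Vector:
    """Integrate from t=1 (pure noise) to t=0 (clean latent) with Euler steps."""
    x = list(noise)
    for i in range(steps):
        t, t_next = 1.0 - i / steps, 1.0 - (i + 1) / steps
        v = oracle_velocity(x, t, x0, noise)
        x = [xi + (t_next - t) * vi for xi, vi in zip(x, v)]  # move toward t_next
    return x

def decode(latent: Vector) -> Vector:
    """Placeholder for the VAE decoder (identity here)."""
    return latent

random.seed(0)
target = [0.5, -1.0, 2.0]                         # hypothetical clean latent
noise = [random.gauss(0.0, 1.0) for _ in target]  # starting point at t=1
video = decode(euler_sample(noise, target, steps=10))
print([round(v, 6) for v in video])               # [0.5, -1.0, 2.0]
```

With a perfect velocity prediction the path from noise to data is a straight line, so even a few Euler steps land on the target; a trained model only approximates this velocity, which is why real samplers use more steps.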

2. Do I need technical expertise to use Wan AI?

No technical background is required. Wan AI is designed with a user-friendly interface that allows beginners and professionals alike to generate high-quality videos easily without coding or complex configuration.

3. What types of videos can I create with Wan Video?

Wan Video supports a wide range of video content, including character animation, dancing, sports, cinematic storytelling, educational content, marketing videos, historical restoration, and stylized creative scenes.

4. How long does it take to generate a video with Wan AI?

Video generation time depends on resolution, duration, and motion complexity. Higher-performance versions of Wan AI offer faster processing speeds for time-sensitive production needs.

5. Can I customize the video output with Wan AI?

Yes. Wan Video allows flexible control over resolution, frame rate, motion intensity, camera movement, visual style, and more—giving you full creative control over the final result.

6. What input formats does Wan Video support?

Wan Video currently supports text-to-video and image-to-video generation. Users can provide detailed text prompts or reference images to guide scene composition, motion, and visual style.

7. Does Wan AI support multilingual video generation?

Yes. Wan AI supports multilingual text prompts, including English and Chinese. Video content and on-screen visual text can be generated based on different languages depending on the selected model.

8. Is there a limit to the length of videos generated by Wan AI?

Video length limits depend on the platform plan and model version. Entry-level access may have shorter duration limits, while advanced plans support longer, more complex video generation.

9. How does Wan Video ensure high-quality output?

Wan Video leverages advanced VAE and DiT architectures, large-scale training datasets, and optimized motion modeling to ensure cinematic visuals, smooth transitions, realistic physics, and stable temporal consistency.

10. How does Wan Video handle complex scenes with multiple characters?

Wan Video analyzes character relationships, spatial positioning, and motion interactions from the input prompt, ensuring natural movement, realistic interactions, and consistent multi-character behavior.

11. What open-source models are currently available from Wan AI?

Wan AI has open-sourced multiple models, including high-definition Text-to-Video and Image-to-Video models, as well as specialized MoE-based architectures for stable motion generation and stylized video synthesis.

12. What other open-source AI models has Alibaba Cloud released related to Wan AI?

Alibaba Cloud has released a broad ecosystem of open-source AI models, including Qwen large language models, multimodal vision-language models, image generation models, and audio generation systems—forming a complete multimodal AI infrastructure alongside Wan AI.

13. What is Wan Image by Wan AI and what can it be used for?

Wan Image is the image generation and editing system under the Wan AI framework. It supports text-to-image creation, high-resolution visual rendering, commercial-grade design output, and creative illustration across advertising, e-commerce, branding, gaming, and digital art production.

14. Does Wan Image support professional image editing and style control?

Yes. Wan Image supports advanced image editing features such as inpainting, outpainting, object removal, background replacement, super-resolution enhancement, and multi-style transfer. Users can precisely control realism, artistic styles, lighting, and composition for professional creative workflows.